Abstract

Precoding with block diagonalization is an attractive scheme for approaching sum capacity in 
multiuser multiple input multiple output (MIMO) broadcast channels. This method requires either 
global channel state information at every receiver or an additional training phase, which demands 
additional system planning. In this paper we propose a lattice based multi-user precoder that uses block 
diagonalization combined with pre-equalization and perturbation for the multiuser MIMO broadcast 
channel. An achievable sum rate of the proposed scheme is derived and used to show that the proposed 
technique approaches the achievable sum rate of block diagonalization with water-filling but does 
not require the additional information at the receiver. Monte Carlo simulations with equal power 
allocation show that the proposed method provides better bit error rate and diversity performance than 
block diagonalization with a zero-forcing receiver. Additionally, the proposed method shows similar 
performance to the maximum likelihood receiver but with much lower receiver complexity. 

I. Introduction

Recent information theoretic work on multiple-input multiple-output (MIMO) communication 
has shown that the sum capacity, the maximum sum rate in the broadcast channel, is achieved by 
dirty paper coding (DPC) [1]. The key idea of DPC is to pre-cancel interference at the transmitter 
using perfect channel state information (CSI) and complete knowledge of the transmitted signals. 
DPC, while theoretically optimal, is an information theoretic concept that has proven to be 
difficult to implement in practice. Consequently, several practical near-DPC techniques based on 
the concept of precoding have been proposed that offer different tradeoffs between complexity 
and performance [7]-[22]. 

One of the simplest approaches for multiuser precoding is to premultiply the transmitted signal 
by a suitably normalized zero-forcing (ZF) or minimum mean squared error (MMSE) inverse of 
the multiuser matrix channel [7], [8]. The gap in the sum rate between DPC and these linear 
precoding schemes, however, is quite large due to the transmit power enhancement resulting 
from power normalization. 

A means of avoiding transmit power enhancement is to use non-linear precoding, or lattice 
precoding [9]-[14], where a modulo operation or vector perturbation is used to reduce transmit 
power enhancement. The main idea is that the transmitter uses an extended constellation in which
every point of the fundamental constellation region has multiple equivalent points.
The modulo operation maps a point that power normalization has moved into the extended
region back to its equivalent point inside the fundamental region.
Tomlinson-Harashima MIMO precoding is one example of transmit precoding with a modulo 
operation [9], [10]. Another example is vector perturbation where the transmit signal vector is 
perturbed by another vector to minimize transmit power from the extended constellation [12]. 
Finding the optimal perturbation involves solving a minimum distance type problem and thus can 
be implemented using sphere-encoding or other full search based algorithms, which still have 
moderate complexity. Lower complexity alternatives include lattice-reduction aided broadcast 
precoding, which uses the Lenstra-Lenstra-Lovasz (LLL) algorithm [13], and a simple vector 
approximation based on the Rayleigh-Ritz theorem [14]. These vector perturbing schemes enable
a simple receiver structure via a modulo operation [15]. The multi-user precoding approaches
mentioned above assume that the transmitter sends a single stream to each user; in this paper,
we consider precoding schemes for multiple stream transmission to increase each user's peak 
rate in multi-user links. 

An alternative to implementing DPC is block diagonalization (BD), which supports multiple 
stream transmission as well [16]-[19]. The basic concept of BD consists of using special transmit
vectors that ensure zero interference between users but do not completely invert the channel.
The resulting multiuser MIMO channel matrix has a block diagonal form, so each user can
apply a standard point-to-point MIMO receiver. Unlike the aforementioned inverse techniques,
BD still requires equalization at the receiver but suffers less from noise enhancement. When user 
channels are mutually orthogonal, BD achieves the same sum capacity as DPC [20]. The main 
challenge with BD is that unlike inverse or nonlinear methods, either global CSI is required at 
all the receivers (obtained through an iterative update for example [18]) or an additional training 
phase is needed so that each user can estimate their equivalent channel and perform detection 
[22]. 

In this paper, we present a lattice-based non-linear precoding scheme that supports multiple 
stream transmission in a multiuser broadcast channel. All prior approaches mentioned above 
assume at a minimum complete and perfect CSI at the transmitter. We make the same assumption 
in this paper. Our proposed scheme exploits the BD linear precoding algorithm to transmit 
interference free groups of data to different users. To avoid the need for a complex receiver, 
however, we further use a ZF prefilter combined with a multi-stream vector perturbation to avoid 
the corresponding power enhancement. The main features of our approach are that (i) we do not
require global CSI at the receiver or an additional training phase and (ii) our approach has much 
lower receiver complexity, at the expense of additional transmit complexity over BD. 

We derive the achievable rate of our system under an optimal perturbation assumption. An 
achievable rate is an error-free supportable rate that satisfies any given power constraint [23]. 
In our numerical results, we show that the resulting rate is equivalent to that of BD combined 
with water-filling under an equal power constraint for each user [19], [20]. We also compare the
proposed algorithm with previously proposed BD assuming equal power allocation [19] and a
ZF or maximum likelihood (ML) receiver [16], [18] in terms of the uncoded bit error rate. We 
find that our approach has similar diversity performance to BD with an ML receiver and much 
better performance than BD with a ZF receiver. Thus, from both rate and diversity perspectives, 
our approach achieves similar performance to BD with an optimal receiver but with much lower 
receiver complexity. This is a particular advantage in multiuser systems with low-cost low-power 
mobile users. 

This paper is organized as follows. In Section II we begin with the system model and present
a summary of BD and its limitations. In Section III we propose lattice-based precoding with
the BD algorithm to support multiple stream transmission and derive its achievable rate. We
present numerical results including achievable rate, probability of symbol error, and complexity
in Section IV and conclude in Section V.

II. Broadcast MIMO System with Block Diagonalization 

In this section we discuss the narrow-band broadcast signal and channel model under
consideration. Then we discuss block diagonalization and its limitations.

A. Notation 

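• Let $\mathbf{A}$ denote a complex matrix, and $\mathbf{A}^T$, $\mathbf{A}^H$, and $\mathbf{A}^{-1}$ denote the transpose, conjugate
transpose, and pseudo-inverse of $\mathbf{A}$, respectively.

• $(\mathbf{a})_l$ and $(\mathbf{A})_{l,m}$ denote the $l$-th element of vector $\mathbf{a}$ and the $(l,m)$-th element of matrix $\mathbf{A}$,
respectively.

• $\mathrm{diag}(a_1, a_2, \cdots, a_n)$ denotes an $n \times n$ diagonal matrix with $\mathrm{diag}(a_1, a_2, \cdots, a_n)_{l,l} = a_l$.

• For $m \times m$ matrices $\mathbf{A}_l$, $\mathbf{A} = \mathrm{diag}(\mathbf{A}_1, \cdots, \mathbf{A}_n)$ denotes an $mn \times mn$ block diagonal matrix
represented by

$$\mathbf{A} = \begin{bmatrix} \mathbf{A}_1 & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \mathbf{A}_n \end{bmatrix}.$$

• The trace of an $m \times m$ square matrix $\mathbf{A}$ is expressed as $\mathrm{Tr}(\mathbf{A}) = \sum_{l=1}^{m} (\mathbf{A})_{l,l}$.

• The Frobenius norm of an $m \times n$ matrix $\mathbf{A}$ satisfies $\|\mathbf{A}\|_F^2 = \mathrm{Tr}(\mathbf{A}\mathbf{A}^H)$.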

B. MIMO Broadcast Signal Model 

Consider the MIMO broadcast signal model with $K$ users, each employing $N_R$ receive antennas
and each receiving its own data streams, which are manipulated by a precoder at the base station
with $N_T$ antennas as shown in Fig. 1. We assume that the channel is flat fading and for the purpose
of simulations we model the elements of each user's channel matrix as independent complex
Gaussian random variables with zero mean and unit variance. Such a narrow-band flat fading
model is reasonable in future MIMO systems, for example, via orthogonal frequency division
multiplexing (OFDM); however, we defer a detailed discussion of OFDM to future work. Let $\mathbf{x}_k$,
$\mathbf{H}_k$, and $\mathbf{n}_k$ denote the $k$-th transmit signal vector, the channel from the base station to user $k$, and
the thermal noise at user $k$, respectively. The noise $\mathbf{n}_k$ represents additive white Gaussian noise
with variance $\sigma_n^2$. In the broadcast channel, since the interference of the other users propagates
in the desired user's channel, the received signal at the $k$-th receiver is thus

$$\mathbf{y}_k = \mathbf{H}_k \mathbf{M}_k \mathbf{x}_k + \mathbf{H}_k \sum_{l=1,\, l \neq k}^{K} \mathbf{M}_l \mathbf{x}_l + \mathbf{n}_k, \tag{1}$$

where $\mathbf{M}_l$ denotes the precoder for the $l$-th user [18]-[22].



C. Block Diagonalization and Its Limitations 

In [18] and [19], the authors choose $\mathbf{M}_k$ such that the subspace spanned by its columns lies
in the null space of $\mathbf{H}_l$ ($\forall l \neq k$), that is, $\mathbf{H}_l \mathbf{M}_k = \mathbf{0}$ for $l = 1, \cdots, k-1, k+1, \cdots, K$. If we
define $\tilde{\mathbf{H}}_k$ as

$$\tilde{\mathbf{H}}_k = \left[ \mathbf{H}_1^T \; \cdots \; \mathbf{H}_{k-1}^T \;\; \mathbf{H}_{k+1}^T \; \cdots \; \mathbf{H}_K^T \right]^T, \tag{2}$$

then $\mathbf{M}_k$ can be obtained by calculating the null space of $\tilde{\mathbf{H}}_k$. Let us define the SVD of $\tilde{\mathbf{H}}_k$ as

$$\tilde{\mathbf{H}}_k = \tilde{\mathbf{U}}_k \tilde{\boldsymbol{\Lambda}}_k \left[ \tilde{\mathbf{V}}_k^{(1)} \;\; \tilde{\mathbf{V}}_k^{(0)} \right]^H, \tag{3}$$

where $\tilde{\mathbf{U}}_k$ and $\tilde{\boldsymbol{\Lambda}}_k$ denote the left singular vector matrix and the matrix of ordered singular
values of $\tilde{\mathbf{H}}_k$, respectively. Matrices $\tilde{\mathbf{V}}_k^{(1)}$ and $\tilde{\mathbf{V}}_k^{(0)}$ denote the right singular matrices
consisting of the singular vectors corresponding to non-zero singular values and zero singular
values, respectively. Note that $\mathbf{H}_l \tilde{\mathbf{V}}_k^{(0)} = \mathbf{0}$ ($\forall l \neq k$). Assuming each mobile station has $N_{R,k}$
receive antennas, the $k$-th user can receive $L_k \le N_{R,k}$ streams. Since the columns of $\tilde{\mathbf{V}}_k^{(0)}$
span the null space of $\tilde{\mathbf{H}}_k$, constructing $\mathbf{M}_k$ using a linear combination of $L_k$ columns of $\tilde{\mathbf{V}}_k^{(0)}$
will automatically satisfy the zero-interference constraints. The specific precoder chosen depends
on additional capacity or bit error rate considerations. Assuming that the channel matrices are
full rank (which occurs with probability one in complex Gaussian channels), the base station
requires that the number of transmit antennas, $N_T$, is at least $\sum_{l=1,\, l \neq k}^{K} N_{R,l} + L_k$ to ensure there
are at least $L_k$ columns in each $\tilde{\mathbf{V}}_k^{(0)}$ and thus satisfy the dimensionality constraint required to
cancel interference [19], [21].
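As an illustration of the zero-interference constraint, the following minimal sketch treats the special case of $K = 2$ single-antenna users with $N_T = 2$, where the null space of the other user's $1 \times 2$ channel can be written in closed form instead of through the SVD in (3). The channel values and the helper names `null_vector` and `apply_row` are hypothetical and chosen only for this example.

```python
def null_vector(row):
    """Unit-norm vector m with row . m = 0 for a 1x2 complex row [c, d]."""
    c, d = row
    norm = (abs(c) ** 2 + abs(d) ** 2) ** 0.5
    return [d / norm, -c / norm]


def apply_row(row, vec):
    """Product of a 1x2 channel row with a 2x1 precoding vector."""
    return row[0] * vec[0] + row[1] * vec[1]


# Hypothetical channel realizations for two single-antenna users.
h1 = [0.8 + 0.3j, -0.2 + 0.9j]
h2 = [0.1 - 0.7j, 0.5 + 0.4j]

# Each precoder lies in the null space of the other user's channel.
m1 = null_vector(h2)
m2 = null_vector(h1)

leak_to_user2 = apply_row(h2, m1)  # interference from user 1's stream, ~0
leak_to_user1 = apply_row(h1, m2)  # interference from user 2's stream, ~0
gain_user1 = apply_row(h1, m1)     # effective scalar channel of user 1
gain_user2 = apply_row(h2, m2)     # effective scalar channel of user 2
```

With more antennas per user the same construction requires a numerical null-space routine, which is omitted here.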

When excess transmit antennas are available, i.e., $N_T > \sum_{l=1,\, l \neq k}^{K} N_{R,l} + L_k$, it is possible to
improve BD using transmit antenna selection or eigenmode selection [21]. In addition, when more
receive antennas than the number of transmit streams are available, receive antenna selection can
further improve BD [20]. In this paper, for notational and analytical simplicity, we assume that
every user has the same number of receive antennas $N_R$, the number of transmit data streams
makes full use of the receive antennas, $L_k = N_R$, and the number of transmit antennas exactly
satisfies the dimensionality constraint, $N_T = \sum_{k=1}^{K} L_k$.
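The dimensionality constraint can be checked with a short helper. This is a minimal sketch; the function name `min_transmit_antennas` and its list-based arguments are hypothetical and not part of the original formulation.

```python
def min_transmit_antennas(rx_antennas, streams):
    """Smallest N_T with N_T >= sum_{l != k} N_{R,l} + L_k for every user k.

    rx_antennas[k] is N_{R,k} and streams[k] is L_k (with L_k <= N_{R,k}).
    """
    if len(rx_antennas) != len(streams):
        raise ValueError("one stream count is needed per user")
    if any(l_k > n_r for n_r, l_k in zip(rx_antennas, streams)):
        raise ValueError("a user cannot receive more streams than antennas")
    total_rx = sum(rx_antennas)
    return max(total_rx - n_r + l_k for n_r, l_k in zip(rx_antennas, streams))


# Under the simplifying assumption L_k = N_R, the bound reduces to K * N_R.
n_t_two_users = min_transmit_antennas([2, 2], [2, 2])
n_t_three_users = min_transmit_antennas([2, 2, 2], [2, 2, 2])
n_t_single_stream = min_transmit_antennas([2, 2], [1, 1])
```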

After pre-canceling the interference of the other users thanks to the precoder $\mathbf{M}_k$, the received
signal of the $k$-th receiver, $\mathbf{y}_k$, is given by

$$\mathbf{y}_k = \mathbf{H}_{\mathrm{eff},k} \mathbf{x}_k + \mathbf{n}_k, \tag{4}$$

where $\mathbf{H}_{\mathrm{eff},k} = \mathbf{H}_k \mathbf{M}_k$ denotes the effective channel of the $k$-th user. Since the $k$-th user receives
its own data stream without interference from other users, the methodology for designing an
appropriate decoder is similar to that for single user MIMO cases after channel estimation [16],
[18]. Note that we cannot use a common pilot for estimating $\mathbf{H}_{\mathrm{eff},k}$ since each user uses a
different precoding filter $\mathbf{M}_k$ and thus $\mathbf{H}_{\mathrm{eff},k}$ consists of the precoding filter as well as the raw
channel $\mathbf{H}_k$ [22]. This means that either an additional training phase or global CSI at the receiver
is needed.

To achieve the highest sum rate, after removing the effect of the interfering users' streams, BD
maximizes the data throughput with the well-known water-filling (WF) algorithm [19]. Define
the SVD of $\mathbf{H}_{\mathrm{eff},k}$ as

$$\mathbf{H}_{\mathrm{eff},k} = \mathbf{U}_k \left[ \boldsymbol{\Lambda}_k \;\; \mathbf{0} \right] \left[ \mathbf{V}_k^{(1)} \;\; \mathbf{V}_k^{(0)} \right]^H, \tag{5}$$

where $\mathbf{V}_k^{(1)}$ denotes the set of the right singular vectors corresponding to non-zero singular values
and $\mathbf{U}_k$ is the left singular matrix. Assume that $\mathbf{Q}_k^{1/2}$ is a diagonal matrix whose elements scale
the power transmitted into each of the columns of $\mathbf{M}_k$. When the precoder of the $k$-th user $\mathbf{M}_k$
is given by

$$\mathbf{M}_k = \tilde{\mathbf{V}}_k^{(0)} \mathbf{V}_k^{(1)} \mathbf{Q}_k^{1/2} \tag{6}$$

at the transmitter and the decoder of the $k$-th user also has $\mathbf{U}_k^H$ at the receiver, the data stream of
the $k$-th user is received without the effect of multi-user interference and the maximum achievable
sum rate of the BD algorithm, $C_{BD}$, is given by

$$C_{BD} = \max_{\mathbf{Q}:\, \mathrm{Tr}(\mathbf{Q}) \le P_T} \log\det\left( \mathbf{I} + \frac{\boldsymbol{\Lambda}^2 \mathbf{Q}}{\sigma_n^2} \right), \tag{7}$$

where $\boldsymbol{\Lambda} = \mathrm{diag}(\boldsymbol{\Lambda}_1, \cdots, \boldsymbol{\Lambda}_K)$, $\mathbf{Q} = \mathrm{diag}(\mathbf{Q}_1, \cdots, \mathbf{Q}_K)$, and $\mathbf{Q}$ ($\ge 0$) denotes the optimal power
loading subject to a total power constraint $P_T$ [19]. The optimization problem in (7) is a standard
WF problem over the eigenvalues of the equivalent channels [20]. Note that BD achieves the
sum capacity achieved by DPC when the user channels are orthogonal (see Lemma 1, [20]).

To implement the WF algorithm, knowledge of the decoding filter $\mathbf{U}_k$ is required at each
receiver. The decoding filter depends on $\mathbf{H}_{\mathrm{eff},k}$, but $\mathbf{H}_{\mathrm{eff},k}$ consists of both the
original channel matrix and the nulling matrix $\tilde{\mathbf{V}}_k^{(0)}$. Because the nulling matrix is calculated
from the channel state information of the other users, either the receiver must calculate the
decoding filter directly from an estimate of $\mathbf{H}_{\mathrm{eff},k}$ [16], or the transmitter must send some
information from which $\mathbf{U}_k$ can be calculated at the $k$-th receiver [22].

In the first method, all receivers have to estimate their effective channel, which includes the
precoding followed by the physical channel. A receiver can estimate the effective channel if dedicated
pilot sequences, different for every receiver, are used in the system and precoded by the same
transformation. The common pilot, however, is still required, so the first method may increase the
control channel overhead. Alternatively, in order to use only the common pilot for channel estimation,
the transmitter can broadcast the appropriate receiver information to each user [22]. Note that this
approach also increases system overhead, so a new scheme that does not require any coordination
information is needed. Therefore, the required coordination is the main limitation of the BD
algorithm: all receivers should have global channel state information to generate their own receive
filters to approach $C_{BD}$ in (7).
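For reference, the water-filling power loading that BD relies on in this section can be sketched as follows. This is a minimal illustration of the textbook algorithm over a list of squared singular values, assuming a sum power constraint met with equality; the function name `water_filling` and the example gains are hypothetical.

```python
def water_filling(gains, total_power, noise_var=1.0):
    """Power loading q_i >= 0 maximizing sum_i log(1 + gains[i] * q_i / noise_var)
    subject to sum_i q_i = total_power, where gains[i] is a squared singular value."""
    if total_power <= 0 or any(g <= 0 for g in gains):
        raise ValueError("total_power and all gains must be positive")
    # Sort streams from strongest to weakest and compute their noise floors.
    order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    floors = [noise_var / gains[i] for i in order]
    active = len(order)
    while True:
        # Water level if only the `active` strongest streams are used.
        level = (total_power + sum(floors[:active])) / active
        if level > floors[active - 1]:
            break
        active -= 1  # the weakest active stream would get no power; drop it
    powers = [0.0] * len(gains)
    for rank, i in enumerate(order[:active]):
        powers[i] = level - floors[rank]
    return powers


balanced = water_filling([4.0, 1.0], total_power=1.0)
one_weak_stream = water_filling([4.0, 0.1], total_power=1.0)
```

In the second call the weak stream lies above the water level and receives no power, which is the behavior that distinguishes WF from the equal power allocation assumed later for the proposed scheme.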

III. Lattice-Based Broadcast SM-MIMO Precoding System 

In this section we introduce a lattice-based (LB) multi-user (MU) spatial multiplexing (SM) MIMO
precoding system that supports multiple stream transmission in a broadcast channel without any
coordination information. By combining BD with a perturbation algorithm, the scheme avoids
coordination information while still transmitting multiple streams to each user.
The proposed scheme requires a simple decoder at each user receiver containing primarily a
modulo operation. We first describe vector perturbation and our proposed algorithm.
Then, we present an achievable rate analysis of the proposed system under the assumption of
an optimal perturbation.

A. Perturbation Background 

Vector perturbation was introduced in [12] to prevent the transmit power enhancement that
occurs when channel inversion using a ZF or MMSE prefilter is used at the transmitter [7]. Prior
work considered perturbation in the case where $L_k = 1$, i.e., the single stream case. To help
explain our approach we summarize the vector perturbation concept here.

Let $\mathbf{H}$ denote a $K \times N_T$ multi-user channel matrix assuming each user has a single receive
antenna. The idea of perturbation is to find a "perturbing" vector $\mathbf{p}$ from an extended constellation
($A\mathbb{CZ}^K$) to minimize the transmit power, where $\mathbf{p}$ is chosen by solving

$$\mathbf{p} = \arg\min_{\mathbf{p}' \in A\mathbb{CZ}^K} \left\| \mathbf{H}^{-1} (\mathbf{s} + \mathbf{p}') \right\|^2, \tag{8}$$

where $\mathbf{s}$ is the modulated signal vector before perturbing, the scalar $A$ is chosen depending on the
original constellation size (we take $A = 2$ for 4-QAM), and $\mathbb{CZ}^K$ denotes the $K$-dimensional
complex lattice$^1$ [12], [13].

To illustrate, consider the set of equivalent points that form an extended constellation in Fig. 2.
A symbol illustrated by the black-filled circle is an original symbol that lies in the fundamental
region before pre-distortion by channel matrix inversion ($\mathbf{H}^{-1}$). A set of points marked by the
circle is used to represent symbols which are congruent to the symbol in the fundamental region.
After pre-distortion, the resulting constellation region also becomes distorted and thus it takes
more power to transmit the original point than before distortion because the black-filled point
after pre-distortion is further from the origin (+) than before pre-distortion. Note that the set of
points marked by the circle represents the same symbol. Among the equivalent points, if the
transmitter sends the gray-filled circle point, which is the one closest to the origin, to minimize
transmit power, the receiver finds its equivalent image inside the fundamental constellation region
using a modulo operation and treats it as if the black-filled circle point is actually received [15].
Note that the modulo operation is simply a mapping procedure, e.g., any point marked by the
circle can be mapped back to the point marked by the black-filled circle via the modulo operation
at the receiver in Fig. 2.

The problem of finding the nearest point in the extended constellation is a complex version
of the $K$-dimensional integer-lattice least-squares problem [12]. Therefore, an exhaustive search
is required to solve (8). It is possible to reduce the search complexity by using lattice reduction
algorithms [13], [14].
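The exhaustive search for the perturbing vector in (8) can be sketched for a toy $2 \times 2$ case. This is a minimal illustration rather than the sphere encoder of [12]: it assumes 4-QAM symbols at $\pm 0.5 \pm 0.5j$ (consistent with $A = 2$), truncates the lattice to integer offsets in $[-1, 1]$ per real dimension, and uses a hypothetical channel; the helper names are likewise hypothetical.

```python
from itertools import product


def inverse_2x2(h):
    """Inverse of a 2x2 complex matrix given as nested lists."""
    (a, b), (c, d) = h
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def transmit_energy(h_inv, v):
    """Squared norm of H^{-1} v, the quantity minimized in (8)."""
    x = [h_inv[i][0] * v[0] + h_inv[i][1] * v[1] for i in range(2)]
    return sum(abs(xi) ** 2 for xi in x)


def best_perturbation(h, s, spacing=2.0, radius=1):
    """Search p' in spacing * CZ^2, truncated to integer parts in [-radius, radius]."""
    h_inv = inverse_2x2(h)
    ints = range(-radius, radius + 1)
    candidates = [spacing * complex(re, im) for re, im in product(ints, ints)]
    return min(
        product(candidates, repeat=2),
        key=lambda p: transmit_energy(h_inv, [s[0] + p[0], s[1] + p[1]]),
    )


h = [[1.0 + 0.2j, 0.9 - 0.1j], [0.8 + 0.3j, 1.0 + 0.1j]]  # hypothetical channel
s = [0.5 + 0.5j, -0.5 + 0.5j]                              # 4-QAM symbols, A = 2
p = best_perturbation(h, s)
h_inv = inverse_2x2(h)
energy_plain = transmit_energy(h_inv, s)
energy_perturbed = transmit_energy(h_inv, [s[0] + p[0], s[1] + p[1]])
```

Because the zero vector is among the candidates, `energy_perturbed` can never exceed `energy_plain`. The search visits $9^2 = 81$ candidates here and grows exponentially with the dimension, which is why reduced-complexity methods are of interest.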

B. Lattice-Based Multiuser Spatial Multiplexing MIMO Precoder Using BD 

The proposed lattice-based multi-user spatial multiplexing MIMO precoding system using the
block diagonalization algorithm is illustrated in Fig. 3. The transmitter encodes each user's data
streams independently. The $k$-th transmitter consists of the cascade of two filters, $\mathbf{H}_{\mathrm{eff},k}^{-1}$ and $\mathbf{M}_k$,
where the effective channel $\mathbf{H}_{\mathrm{eff},k} = \mathbf{H}_k \mathbf{M}_k$ is calculated as in (4) and the precoding matrix is

$$\mathbf{M}_k = \tilde{\mathbf{V}}_k^{(0)}. \tag{9}$$

Note that $\mathbf{M}_k$ in (9) has a different form from (6). As mentioned in Section II-C, using the
optimal BD solution with diagonalization via SVD requires additional coordination information
since the effective channel used for the SVD operation includes CSI from other users. To avoid the
additional coordination information, the precoder $\mathbf{M}_k$ has only to remove multiuser interference,
and the inversion of $\mathbf{H}_{\mathrm{eff},k}$, instead of the SVD, parallelizes each user's streams. In addition, we do not
require transmit power optimization ($\mathbf{Q}_k^{1/2}$) as seen in (6) since we assume equal power allocation
on each stream.

$^1$ In [12] and [13], the authors use a $2K$-dimensional lattice because they assume that the current
realization of $\mathbf{H}$ is separated into its real and imaginary parts. In this paper, however, we use the
complex version of $\mathbf{H}$ and the $K$-dimensional complex lattice defined as $\mathbb{CZ}^K$.

To prevent transmit power enhancement due to $\mathbf{H}_{\mathrm{eff},k}^{-1}$, the proposed transceiver applies a
perturbation to the transmitted signal vector to reduce the norm of the precoded signal vector
for each user. The perturbation for user $k$ is given by

$$\mathbf{p}_k = \arg\min_{\mathbf{p}'_k \in A\mathbb{CZ}^{L_k}} \left\| \mathbf{H}_{\mathrm{eff},k}^{-1} (\mathbf{s}_k + \mathbf{p}'_k) \right\|^2 = \arg\min_{\mathbf{p}'_k \in A\mathbb{CZ}^{L_k}} \left\| \mathbf{H}_{\mathrm{eff},k}^{-1} \tilde{\mathbf{s}}_k \right\|^2, \tag{10}$$

where $\mathbf{s}_k$ and $\mathbf{p}_k$ are the transmit signal vector of the $k$-th user before perturbing and the perturbing
vector of the $k$-th user, respectively, and $\tilde{\mathbf{s}}_k = \mathbf{s}_k + \mathbf{p}'_k$ is the perturbed symbol vector. Essentially we
find the $k$-th user's perturbing vector $\mathbf{p}_k$ from the set of $L_k$-dimensional complex lattice points.
Unlike the work in [7], the proposed perturbation operates in the stream domain, not the user domain.
The reason is that the block diagonalization eliminates multiuser interference. In the work in [7],
the channel inverse is taken on the multiuser channel matrix. In our case, it is taken only over the
effective channel matrix after the block diagonalization step.

Since the transmitter sends the pre-distorted symbol with a perturbation, the received signal
is given by

$$\mathbf{y}_k = \tilde{\mathbf{s}}_k + \mathbf{n}_k. \tag{11}$$

Note that the received signal in (11) consists of the perturbed symbol ($\tilde{\mathbf{s}}_k$) and the AWGN vector.
The receiver has only to map the perturbed symbol back to the original symbol ($\mathbf{s}_k$) in the
fundamental region using modulo operations [15], and the estimated symbol of $\mathbf{s}_k$ is given by

$$\hat{\mathbf{s}}_k = \mathrm{mod}(\mathbf{y}_k), \tag{12}$$

where $\mathrm{mod}(\cdot)$ denotes a modulo operation. As mentioned in the previous section, $\mathrm{mod}(\cdot)$ results
in a simple decoder at the receiver.
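The modulo receiver in (12) can be sketched per stream as follows. This is a minimal illustration assuming a symmetric fundamental region $[-A/2, A/2)$ in each real dimension with $A = 2$ and 4-QAM points at $\pm 0.5 \pm 0.5j$; the function name `mod_lattice` and the noise sample are hypothetical.

```python
import math


def mod_lattice(y, spacing=2.0):
    """Fold the real and imaginary parts of y into [-spacing/2, spacing/2)."""
    def fold(v):
        return v - spacing * math.floor(v / spacing + 0.5)
    return complex(fold(y.real), fold(y.imag))


s = 0.5 - 0.5j                 # original 4-QAM symbol
p = 2.0 * complex(1, -2)       # perturbation drawn from A * CZ with A = 2
noise = 0.1 + 0.05j            # small hypothetical noise sample

recovered_clean = mod_lattice(s + p)          # perturbation removed exactly
recovered_noisy = mod_lattice(s + p + noise)  # original symbol plus noise
```

The receiver needs neither the perturbation nor the effective channel; a standard 4-QAM slicer applied to the folded value completes the detection.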

C. Achievable Rate Analysis of MIMO Block Diagonalization with Perturbation 

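The problem of achievable rate analysis is reduced to the single-user MIMO case thanks to the
fact that the nulling matrix $\mathbf{M}_k$ removes multiuser interference. Recall the received signal model
that uses the effective channel in (4). Define $\mathbf{H}_{\mathrm{eff},k} = \mathbf{U}_k \boldsymbol{\Lambda}_k \mathbf{V}_k^H$ where $\mathbf{U}_k = [\mathbf{u}_1 \cdots \mathbf{u}_{L_k}]$,
$\mathbf{V}_k = [\mathbf{v}_1 \cdots \mathbf{v}_{L_k}]$ and $\boldsymbol{\Lambda}_k = \mathrm{diag}(\lambda_1, \cdots, \lambda_{L_k})$. Then with $\mathbf{r}_k = \mathbf{U}_k^H \mathbf{y}_k$, $\mathbf{t}_k = \mathbf{V}_k^H \mathbf{x}_k$ and
$\mathbf{w}_k = \mathbf{U}_k^H \mathbf{n}_k$, it is possible to transform (4) into the equivalent signal model given by

$$\mathbf{r}_k = \boldsymbol{\Lambda}_k \mathbf{t}_k + \mathbf{w}_k. \tag{13}$$

Define $\gamma$ as

$$\gamma = \left\| \mathbf{H}_{\mathrm{eff},k}^{-1} \mathbf{s}_k \right\|^2 = \mathbf{s}_k^H \left( \mathbf{H}_{\mathrm{eff},k} \mathbf{H}_{\mathrm{eff},k}^H \right)^{-1} \mathbf{s}_k = \mathbf{s}_k^H \mathbf{U}_k \boldsymbol{\Lambda}_k^{-2} \mathbf{U}_k^H \mathbf{s}_k = \sum_{l=1}^{L_k} \mu_l^2 \xi_l, \tag{14}$$

where $\mu_l = 1/\lambda_l$ and $\xi_l = |\mathbf{u}_l^H \mathbf{s}_k|^2$. Since the scalar $\gamma$ is the normalization factor of the transmit
signal and $\mathbf{x}_k = \frac{1}{\sqrt{\gamma}} \mathbf{H}_{\mathrm{eff},k}^{-1} \mathbf{s}_k$, the equivalent transmit signal is given by

$$\mathbf{t}_k = \frac{1}{\sqrt{\gamma}} \mathbf{V}_k^H \mathbf{V}_k \boldsymbol{\Lambda}_k^{-1} \mathbf{U}_k^H \mathbf{s}_k. \tag{15}$$

Substituting (15) for $\mathbf{t}_k$ in (13), the stream-wise form of $\mathbf{r}_k$ is given by

$$(\mathbf{r}_k)_l = \frac{1}{\sqrt{\gamma}} \mathbf{u}_l^H \mathbf{s}_k + (\mathbf{w}_k)_l. \tag{16}$$

Now suppose that $E\{\|\mathbf{x}_k\|^2\} = P$ in (4). From (16), the received signal to noise ratio (SNR)
of each stream, $\mathrm{SNR}_l$, can be represented by

$$\mathrm{SNR}_l = \frac{P\, \xi_l}{\gamma\, \sigma_n^2} = \frac{\rho\, \xi_l}{\sum_{m=1}^{L_k} \mu_m^2 \xi_m}, \tag{17}$$

where $\rho = P / \sigma_n^2$. Therefore, the achievable rate of the $k$-th user, $R_k$, is given by

$$R_k = \sum_{l=1}^{L_k} \log\left( 1 + \mathrm{SNR}_l \right) = \sum_{l=1}^{L_k} \log\left( 1 + \rho \frac{\xi_l}{\sum_{m=1}^{L_k} \mu_m^2 \xi_m} \right). \tag{18}$$

Note that perturbation is not applied yet in calculating (18). Perturbing means that we force
the perturbing vector $\mathbf{p}_k$ to minimize $\gamma$ and generate $\tilde{\mathbf{s}}_k$ that can only be coarsely oriented in
the coordinate system defined by $\mathbf{u}_1, \cdots, \mathbf{u}_{L_k}$ [12]. Therefore, from (18), if we find the proper
perturbing vector and control $\xi_m$ to minimize the normalization factor $\gamma$, then we can obtain the
achievable rate of the proposed scheme

$$R_{k,\mathrm{prop}} = \sum_{l=1}^{L_k} \log\left( 1 + \rho \frac{\xi_l}{\min_{\xi_m} \left( \sum_{m=1}^{L_k} \mu_m^2 \xi_m \right)} \right). \tag{19}$$

From (19), we need to solve for

$$\xi_m = \arg\min_{\xi_m} \sum_{m=1}^{L_k} \mu_m^2 \xi_m. \tag{20}$$

By the Cauchy-Schwarz inequality, the solution of (20) occurs when

$$\mu_1^2 \xi_1 = \cdots = \mu_{L_k}^2 \xi_{L_k} = \omega_0 \tag{21}$$

for an arbitrary constant $\omega_0$. Therefore, we calculate the achievable rate $R_{k,\mathrm{prop}}$ as

$$R_{k,\mathrm{prop}} = \sum_{l=1}^{L_k} \log\left( 1 + \rho \frac{\xi_l}{L_k \omega_0} \right) \tag{22}$$

$$= \sum_{l=1}^{L_k} \log\left( 1 + \frac{\rho}{L_k \mu_l^2} \right) \tag{23}$$

$$= \sum_{l=1}^{L_k} \log\left( 1 + \frac{\rho}{L_k} \lambda_l^2 \right). \tag{24}$$

We obtain (24) by substituting $\mu_l = 1/\lambda_l$. Note that the solution of (21) is valid when the lattice size
is infinite because $\xi_l$ is the relative variable of the perturbed symbol vector $\tilde{\mathbf{s}}_k$ and we would find
the proper $\tilde{\mathbf{s}}_k$ provided that the search range of the lattice is infinite. That is, the perturbation
finds the perturbed symbol that is the closest point to the origin and also controls the power
factor to minimize the transmit symbol power.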
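The step from the general rate expression to the closed form for $R_{k,\mathrm{prop}}$ can be checked numerically. The sketch below evaluates the per-user rate for given singular values $\lambda_l$ and projections $\xi_l$, and compares it with the closed form when the balance condition $\mu_l^2 \xi_l = \omega_0$ holds. The base-2 logarithm, the function names, and the numerical values are assumptions made for illustration.

```python
import math


def rate_for_projections(sing_vals, xi, rho):
    """Per-user rate sum_l log2(1 + rho * xi_l / gamma) with gamma = sum_m xi_m / lambda_m^2."""
    gamma = sum(x / lam ** 2 for lam, x in zip(sing_vals, xi))
    return sum(math.log2(1.0 + rho * x / gamma) for x in xi)


def rate_equalized(sing_vals, rho):
    """Closed form sum_l log2(1 + rho * lambda_l^2 / L_k) under the balance condition."""
    n_streams = len(sing_vals)
    return sum(math.log2(1.0 + rho * lam ** 2 / n_streams) for lam in sing_vals)


sing_vals = [2.0, 1.0, 0.5]   # hypothetical singular values of H_eff,k
rho = 10.0                    # hypothetical P / sigma_n^2
omega_0 = 0.3                 # arbitrary constant in the balance condition

# Choose xi_l = omega_0 * lambda_l^2 so that mu_l^2 * xi_l = omega_0 for every l.
xi_balanced = [omega_0 * lam ** 2 for lam in sing_vals]
rate_general = rate_for_projections(sing_vals, xi_balanced, rho)
rate_closed_form = rate_equalized(sing_vals, rho)
```

The two values agree for any positive `omega_0`, which reflects that the constant cancels in the derivation.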
In addition, (24) is the same expression as the capacity of the single user (SU) MIMO system
with equal power allocation, $C_{EQ}$ [24], [25]. Assuming that the elements of each user's channel
matrix are independent and identically distributed (i.i.d.), $C_{EQ}$ approaches the capacity of the
MIMO system with WF power allocation, $C_{WF}$, for asymptotically high SNR [26]. Thus we
conclude that the achievable rate of the $k$-th user, $R_{k,\mathrm{prop}}$, obtained by the LB MU SM-MIMO
precoding system approaches the optimal capacity that the WF algorithm achieves in a single user
MIMO system asymptotically for high SNR, and that the achievable sum rate of the proposed
scheme, $R_{\mathrm{sum}}$, is given by

$$R_{\mathrm{sum}} = \sum_{k=1}^{K} R_{k,\mathrm{prop}} \approx \sum_{k=1}^{K} C_{WF,k} \quad \text{(for high SNR)}, \tag{25}$$

where $C_{WF,k}$ denotes the achievable rate that each user can approach with the WF algorithm. Note
that $\sum_{k=1}^{K} C_{WF,k}$ is a specific solution of (7) with a per-user power constraint that $\mathrm{Tr}(\mathbf{S}_k) \le P$
and $KP = P_T$. Consequently, assuming that each user equally uses the transmit power $P$, which
is called the equal power constraint, this sum rate is the same as the achievable sum rate $C_{BD}$ as
shown in (7).

IV. Numerical Experiments and Results 

In this section we compare the sum rate and BER performance of the proposed LB MU
SM-MIMO scheme with those of various other schemes through Monte Carlo simulations. To verify
the performance of the proposed LB MU SM-MIMO precoding system, we consider several special
cases. For simplicity, and without violating the dimensionality constraints mentioned in Section
II-C, we assume that the number of receive antennas for each user is equal to $N_R$ and that
each user receives the same number of streams ($L_k$) as the number of receive antennas, that is,
$L_k = N_R$ and $N_T = K N_R$. We use the notation $\{N_R, K\}$ to index the number of each
user's antennas and the number of users. We assume that the elements of each user's channel
matrix are independent complex Gaussian random variables with zero mean and unit variance
for all numerical results.

A. Achievable Sum Rate Comparison

We compare the achievable rate of the proposed scheme with the sum capacity, the achievable
sum rate with the block diagonalization and the power allocation algorithms, and the sum rate
of the channel inversion without perturbation.

The sum capacity, $C_{\mathrm{sum}}$, denotes the maximum sum rate that can be achieved by DPC [6],
given by

$$C_{\mathrm{sum}} = \max_{\{\mathbf{R}_k:\, \sum_{k=1}^{K} \mathrm{Tr}(\mathbf{R}_k) \le P_T\}} \log\det\left( \mathbf{I} + \frac{1}{\sigma_n^2} \sum_{k=1}^{K} \mathbf{H}_k^H \mathbf{R}_k \mathbf{H}_k \right), \tag{26}$$

where $\mathbf{R}_k$ ($\ge 0$) is the signal covariance matrix for user $k$ in the dual multiple access channel
[3], [6].

We obtain the achievable sum rate $C_{BD}$ from (7). For the channel inversion scheme,
each user exploits the ZF algorithm for precoding after using the nulling matrix and block
diagonalization with the constraint that equal power is transmitted to each user's receive antennas
without perturbation. To obtain the achievable sum rate of the channel inversion method, the
transmit power should be normalized to satisfy the power constraint. Therefore, the achievable
sum rate of the channel inversion method, $R_{CI}$, is defined by [19], [27]

$$R_{CI} = \sum_{k=1}^{K} \log\det\left( \mathbf{I} + \frac{P}{\sigma_n^2 \left\| \mathbf{H}_{\mathrm{eff},k}^{-1} \right\|_F^2} \mathbf{I} \right), \tag{27}$$

where $\| \mathbf{H}_{\mathrm{eff},k}^{-1} \|_F^2 = \sum_{l=1}^{L_k} \mu_l^2$ and we recall that $\mu_l$ is the inverse of the $l$-th singular value, $\lambda_l$.

Fig. 4 compares the achievable sum rate of the proposed system with the other systems in the
case of {2, 2}. From Fig. 4 we observe that the sum rate of the proposed scheme is better than
that of the channel inversion scheme and also achieves the sum rate of the BD scheme with the WF
algorithm asymptotically for high SNR, as we expected in Section III-C, without any additional
coordination information or iterative updates for implementing the precoding and decoding
filters. The sum rate of the channel inversion scheme is degraded by the power normalization
from transmit precoding. The proposed scheme exploits the perturbation as a form of power
allocation to compensate for the degradation of power normalization.

Fig. 5 and Fig. 6 show the achievable sum rate performance according to the number of
transmit data streams and the number of users, respectively. The sum rate of the proposed scheme
increases linearly as the number of transmit antennas and the number of users increase,
respectively. We observe in Fig. 5 that the performance gap between the proposed scheme and the
BD scheme with WF increases with the number of transmit data streams. The performance gap between
$R_{\mathrm{sum}}$ and $C_{BD}$ results from the assumption that the proposed scheme uses equal power and
the same constellation for each transmit antenna. Note that the achievable rate of the proposed scheme
approaches the sum rate of the specific case of $C_{BD}$ mentioned in Section III-C. In Fig. 6
we observe the same tendency: the performance gap between the proposed scheme and the BD
scheme with WF increases with the number of users, because the number of transmit
antennas increases as the number of users increases in this simulation.

B. BER performance and Diversity Gain 

In this section we compare the BER performance and diversity gain of the proposed scheme
and the other schemes for several configurations of receive antennas and users,
under the assumption that all schemes use equal power and the same constellation for each transmit
antenna.

The other schemes include ZF MU SM-MIMO, ZF-RX MU MIMO and ML-RX MU MIMO.
The ZF MU SM-MIMO scheme removes multi-user interference with the nulling matrix and uses
ZF precoding without perturbing the transmit signal as a precoding algorithm. The ZF-RX MU
MIMO and the ML-RX MU MIMO schemes are the same as the proposed scheme and the ZF
MU SM-MIMO scheme in that they use the nulling matrix to remove multi-user
interference; however, these schemes have ZF and maximum likelihood (ML) decoders
at the receiver to decode the transmit symbol, respectively [16]. Therefore, the receiver uses
the coordination information or channel estimation to obtain the information about the effective
channel as mentioned in Section II-C. We assume that the ZF-RX MU MIMO and the ML-RX
MU MIMO schemes exploit perfect channel estimation at the receiver. Note that the precoded
channel estimation method is not generally accepted due to the difficulty of designing pilots
and preambles in the downlink channel [22]. The proposed scheme and the ZF MU SM-MIMO
scheme do not need to estimate precoded channel parameters, which are required in the ZF-RX
MU MIMO and ML-RX MU MIMO schemes. In general, pilots and preambles for channel
estimation are not multiplied by any precoder, since every user should monitor them in order
to be served in the near future.

Fig. 7 shows the bit error rate (BER) performance of the proposed scheme compared with the ZF MU
SM-MIMO scheme for 4-QAM. ZF MU SM-MIMO is the same system as the channel inversion
scheme mentioned above. We assume three $\{N_R, K\}$ scenarios to observe the BER performance:
{2, 2}, {2, 3}, and {3, 2}. The overall BER performance of the proposed scheme is better than
that of the ZF MU SM-MIMO scheme. We have at least 10 dB SNR gain in the proposed
system compared with the ZF MU SM-MIMO scheme at $10^{-2}$ BER. From the viewpoint of
diversity gain, we also observe that the proposed system has full diversity gain because the
proposed system modifies optimal decoders such as the maximum likelihood decoder in transmit
precoding and also finds the perturbed symbol which has the optimal decision boundary in the
Voronoi region to minimize the transmit power [15]. Therefore, among the BER curves of the
proposed LB MU SM-MIMO schemes, the case with $N_R = 3$ shows better diversity gain than
the one with $N_R = 2$.

Fig. 8 shows the BER performance of the proposed scheme compared with the ZF-RX MU
MIMO and ML-RX MU MIMO schemes for 4-QAM. The proposed scheme provides a 9 dB SNR
gain at a BER of $10^{-2}$ compared with the ZF-RX MU MIMO scheme. Compared with the
ML-RX MU MIMO scheme, the proposed scheme shows the same diversity gain but a smaller
SNR gain. Note that the ML-RX MU MIMO scheme requires perfect channel
estimation for decoding the transmit symbol. On one hand, as long as perfect channel estimation
is guaranteed at the receiver, ML decoding is the optimal solution to minimize BER. On the other
hand, the proposed scheme shows comparable BER performance to the ML-RX MU MIMO
scheme without any channel estimation. From the viewpoint of diversity gain, the proposed
scheme has the same diversity order as the ML-type receiver, as mentioned earlier.

We examine the performance in terms of diversity gain in particular in Fig. 9. The simulation
results show the BER performance in the case of {3, 2}. We observe that the proposed scheme
has the same slope as the ML-RX MU MIMO scheme with the ML receiver and provides
better diversity gain than the ZF MU SM-MIMO and ZF-RX MU MIMO schemes with linear
precoding and decoding, respectively.
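
The slope comparison above can be made concrete with a small calculation. The following is a minimal sketch, assuming the usual reading of diversity order as the negative slope of the BER curve on a log-log scale at high SNR; the function names and the synthetic curves are hypothetical and are not values read from Fig. 9.

```python
import math

def diversity_order(snr_db_1, ber_1, snr_db_2, ber_2):
    """Estimate the high-SNR slope of a BER curve on a log-log scale."""
    # 10 dB corresponds to one decade of linear SNR.
    decades = (snr_db_2 - snr_db_1) / 10.0
    return -(math.log10(ber_2) - math.log10(ber_1)) / decades

def synthetic_ber(snr_db, d):
    """Hypothetical BER curve that decays exactly as SNR^(-d)."""
    return (10.0 ** (snr_db / 10.0)) ** (-d)

# Two synthetic curves (illustrative only): slopes 3 and 1.
slope_d3 = diversity_order(20, synthetic_ber(20, 3), 30, synthetic_ber(30, 3))
slope_d1 = diversity_order(20, synthetic_ber(20, 1), 30, synthetic_ber(30, 1))
print(slope_d3, slope_d1)
```

Applied to two measured points taken from the high-SNR part of a simulated BER curve, the same function gives a rough estimate of the slope; the estimate is only meaningful where the curve has become approximately straight.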

C. Complexity 

In this section we calculate the approximate complexities of all schemes compared above.

It is hard to calculate the exact complexity of the proposed scheme because the perturbing
algorithm used in the proposed scheme adopts a scalar design parameter $\tau$ that provides a
symmetric encoding region around every signal constellation point (see (9) in [12]). Since the
proposed perturbing algorithm solves a complex version of the $L_k$-dimensional integer-lattice
least-squares problem and we assume that $N = L_k = N_R$, the proposed scheme has an approximate
complexity of $O(N^{M+\alpha})$ for each user, where $M$ denotes the modulation order and $\alpha$
is a positive value that depends on the encoding region parameter $\tau$. The complexity
$O(N^{M+\alpha})$ is greater than $O(N^{M})$, the complexity of the ML decoding scheme with $L_k$
transmit data streams. Consequently, the proposed scheme has a total complexity of
$O(KN^{M+\alpha})$ ($> O(KN^{M})$) at the transmitter. The receive complexity of the proposed
scheme depends primarily on the modulo algorithm, which simply demaps a perturbed symbol to an
original symbol. Therefore, the proposed scheme has a complexity of $O(N)$ for each user's receiver.
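
The asymmetry between the transmitter search and the receiver modulo operation can be illustrated with a minimal sketch. It makes several assumptions: a 2x2 effective channel `h` with made-up entries, 4-QAM points at ±1 ± 1j, a scalar design parameter `TAU = 4`, and a brute-force search over a small set of Gaussian integers as a stand-in for the integer-lattice least-squares solver. It is not the implementation used in the paper.

```python
import itertools
import math

TAU = 4.0  # assumed design parameter for 4-QAM points at +-1 +-1j

def inv2(h):
    """Inverse of a 2x2 complex matrix given as nested lists."""
    (a, b), (c, d) = h
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def perturb(h_inv, s, search=(-1, 0, 1)):
    """Brute-force stand-in for the integer-lattice least-squares search.

    Tries every offset vector l with real and imaginary parts in `search`
    and keeps the one that minimizes the power of h_inv @ (s + TAU * l).
    """
    offsets = [complex(re, im) for re in search for im in search]
    best = None
    for l in itertools.product(offsets, repeat=len(s)):
        x = matvec(h_inv, [sk + TAU * lk for sk, lk in zip(s, l)])
        power = sum(abs(xk) ** 2 for xk in x)
        if best is None or power < best[0]:
            best = (power, x, l)
    return best

def fold(v):
    """Fold a real value into the interval [-TAU/2, TAU/2)."""
    return v - TAU * math.floor((v + TAU / 2) / TAU)

def modulo(y):
    """Receiver side: one modulo operation per received stream."""
    return complex(fold(y.real), fold(y.imag))

h = [[1.0 + 0.2j, 0.9 - 0.1j], [0.8 + 0.3j, 1.1 + 0.1j]]  # hypothetical channel
s = [1 + 1j, -1 + 1j]                                      # 4-QAM data symbols
h_inv = inv2(h)
power, x, l = perturb(h_inv, s)
unperturbed_power = sum(abs(xk) ** 2 for xk in matvec(h_inv, s))
received = matvec(h, x)  # noise-free and unnormalized, for illustration only
recovered = [modulo(yk) for yk in received]
```

In this sketch the transmitter evaluates every candidate offset vector, while each receiver applies a single modulo per stream and needs no knowledge of `h` or of the chosen offset `l`.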

The ZF-RX MU MIMO and the ML-RX MU MIMO schemes have no special encoding
techniques at the transmitter; however, they use the ZF and the ML decoding algorithms at
the receiver, respectively. Therefore, the approximate complexity orders of the ZF-RX MU MIMO
and ML-RX MU MIMO schemes are $O(N^{\omega})$ ($2 < \omega < 3$, see [21]) and $O(N^{M})$ for
each user's receiver, respectively.
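
To make the two receiver types concrete, the following minimal sketch contrasts a zero-forcing detector with an exhaustive maximum likelihood detector for two 4-QAM streams. The effective channel `h_eff`, the constellation scaling, and all function names are assumptions for illustration; both detectors take the effective channel as an input, which is why these schemes require channel estimation at the receiver.

```python
import itertools

QAM4 = [complex(re, im) for re in (-1, 1) for im in (-1, 1)]

def slice_qam4(z):
    """Nearest 4-QAM point to a complex sample."""
    return complex(1 if z.real >= 0 else -1, 1 if z.imag >= 0 else -1)

def zf_detect(h, y):
    """Zero-forcing: invert the 2x2 channel, then slice each stream."""
    (a, b), (c, d) = h
    det = a * d - b * c
    z0 = (d * y[0] - b * y[1]) / det
    z1 = (-c * y[0] + a * y[1]) / det
    return [slice_qam4(z0), slice_qam4(z1)]

def ml_detect(h, y):
    """Maximum likelihood: exhaustive search over every candidate vector."""
    def distance(s):
        return sum(
            abs(y[i] - sum(h[i][j] * s[j] for j in range(2))) ** 2
            for i in range(2)
        )
    return list(min(itertools.product(QAM4, repeat=2), key=distance))

h_eff = [[0.9 + 0.1j, 0.3 - 0.2j], [0.2 + 0.4j, 1.0 - 0.3j]]  # hypothetical
sent = [1 - 1j, -1 - 1j]
# Noise-free received vector, for illustration only.
y = [sum(h_eff[i][j] * sent[j] for j in range(2)) for i in range(2)]
zf_out = zf_detect(h_eff, y)
ml_out = ml_detect(h_eff, y)
```

The zero-forcing path costs one matrix inversion plus a per-stream decision, whereas the exhaustive search evaluates a distance for every candidate symbol vector, so its cost grows quickly with the number of streams and the constellation size.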

We summarize the characteristics of the proposed scheme, ZF MU SM-MIMO, ZF-RX MU
MIMO and ML-RX MU MIMO from the viewpoints of channel estimation and
transmitter-receiver complexity in Table I.



February 1, 2008 



DRAFT 






V. Conclusions 

We have proposed a lattice-based broadcast spatial multiplexing MIMO precoding scheme
that supports multi-stream transmission without any coordination information. The proposed
lattice-based multi-user precoder uses the block diagonalization scheme to remove multi-user
interference and applies the channel inversion algorithm to the resulting effective channel as the
precoding algorithm to avoid additional coordination information. It also exploits a perturbation
to reduce the transmit power enhancement that results from power normalization when channel
inversion algorithms such as zero-forcing and minimum mean squared error algorithms are used
for precoding. At high signal-to-noise ratio, the sum rate of the proposed scheme asymptotically
approaches that of the block diagonalization scheme with the water-filling algorithm, without
any coordination information. Through Monte Carlo simulations, we verified
that the proposed scheme has at least 10 dB SNR gain at a BER of $10^{-2}$ compared with the
zero-forcing multi-user MIMO precoding system with block diagonalization. Also, the proposed
scheme provided a 9 dB SNR gain at a BER of $10^{-2}$ compared with the ZF-RX MU MIMO scheme,
which assumes perfect channel estimation and uses the zero-forcing algorithm at the receiver.
Furthermore, we observed that the proposed system attains full diversity gain, as does
the optimum decoding system, and asymptotically achieves the sum rate of the block
diagonalization scheme with the water-filling algorithm, assuming that the elements of each
user's channel are independent and identically distributed. We leave the reduced-complexity
implementation of the proposed precoding scheme for future work.
TABLE I

The system characteristic comparison between the proposed system, ZF MU SM-MIMO, ZF-RX MU MIMO and
ML-RX MU MIMO

Scheme                Channel estimation *   Transmitter complexity               Receiver complexity **
The Proposed Scheme   No                     $> O(KN^{M})$                        $O(N)$
ZF MU SM-MIMO         No                     $O(KN^{\omega})$, $2 < \omega < 3$   $O(N)$
ZF-RX MU MIMO         Required               -                                    $O(N^{\omega})$, $2 < \omega < 3$
ML-RX MU MIMO         Required               -                                    $O(N^{M})$

* This channel estimation means estimation for the effective channel at each mobile station.

** This receiver complexity is required for each mobile station.

*** We assume that $N = N_R = L_k$, and $M$ denotes a modulation order.
Fig. 1. A MIMO broadcast system where a base station and each user have $N_T$ transmit antennas and $N_R$ receive antennas,
respectively.
Fig. 2. The concept of perturbation.
Fig. 3. The structure of a lattice-based broadcast SM-MIMO precoding system using the block diagonalization algorithm.
Fig. 4. The comparison between the sum capacity ($C_{sum}$, [8]) and the achievable sum rates of the BD scheme with WF ($C_{BD}$,
[19]), the channel inversion scheme ($R_{CI}$) and the proposed scheme ($R_{sum}$): $N_T = 4$, $N_R = 2$ and $K = 2$.
Fig. 5. The achievable sum rate performance according to the number of transmit streams for each user: $K = 2$, (a) SNR = 20 dB,
(b) SNR = 30 dB.
Fig. 6. The achievable sum rate performance according to the number of users: $N_R = 2$ and SNR = 20 dB.
[Plot for Fig. 7: BER versus SNR per user [dB]. Curves: proposed scheme and ZF-MU-MIMO, each for ($N_R$, $K$) = (2, 2), (2, 3) and (3, 2).]
Fig. 7. BER performance comparison between the proposed scheme and the ZF precoding scheme without perturbing for 
4-QAM. 
Fig. 8. BER performance comparison between the proposed scheme and the other schemes ([16]), which have perfect additional
channel estimation at the receiver, for 4-QAM.
Fig. 9. Diversity gain comparison.